Abstract

Elementary but general statistical analyses determine the uncertainty arising from 
photon statistics in measuring a line shift and width. Account is taken of a background 
as well as the required signal. 

1 Introduction 

In many spectroscopic situations, the shift and width of a spectral line are desired. The shift is
most often wanted as a measure of the mean velocity of the emitters in the line of sight, and 
the width as their Doppler temperature. The statistical uncertainty of such a measurement 
can be written down extremely simply, provided the problem is approached in the right way. 

If there is present, in addition to the line signal of interest, a background signal, which 
is to be subtracted from the signal, this complicates the problem somewhat. Nevertheless, 
a simple result is obtained. The way that the instrumental spectral line shape affects the 
result can also be trivially included. 

Although the statistical analysis set forth here is elementary, it does not appear in a 
usable form, to the author's knowledge, in the standard textbook discussions of uncertainty. 
Nor are most students or researchers familiar with it. Therefore it seems useful to set it forth 
in a pedagogic style. The required mathematical results will be cited from an introductory
textbook [1] as references to its numbered theorems.

Classical statistics (uncorrelated photons) rather than Bose-Einstein statistics will be 
assumed throughout, as is an excellent approximation in most experimental situations. 



2 Line Shift 



2.1 Perfect Spectrometer, Zero Background 

We suppose initially that a perfect spectrometer is available, which measures the exact
wavelength $\lambda$ of every photon arriving. The line shape is taken to have the form $I(\lambda)$, where
$I$ is the probability distribution function of photon wavelength. That is, the probability of
any photon having wavelength in the range $(\lambda, \lambda + d\lambda)$, in the limit of small $d\lambda$, is $I(\lambda)\,d\lambda$.

A measurement consists of collecting photons for a certain time interval. We shall ap- 
proximate this as being to collect a fixed total number of photons. Provided the number 
of photons in the sample, N, is large, this will introduce negligible error, at least in the 
line position. The formulation of our problem of determining the uncertainty in a line-shift 
measurement is then to determine what is the probability distribution (or at least its width) 
for the line position deduced from the sample of N independent photons. There are, of 
course, many complicated possible ways of determining the line position of the photon sam- 
ple. However, provided that these methods are not able to take advantage of important a 
priori information about line shifts, they will have no intrinsic advantage over the simplest 
measure of line position, namely the centroid of the photon wavelength distribution: 

$$\mu_k = \frac{1}{N}\sum_{i=1}^{N}\lambda_i . \qquad (1)$$

The index k here is used to refer to the kth measurement (or sample), and i refers to the 
photon number within that sample. 

The advantage of using the centroid as the line position metric is that the distribution
of $\mu_k$ is the subject of the Central Limit Theorem of statistics ([1] Theorem 12.1), which
states that if the standard deviation of the population $I(\lambda)$ from which the sample is drawn
is $\sigma_I$, then in the limit of large $N$, the values of $\mu_k$ are distributed with normal (Gaussian)
distribution, with standard deviation

$$\sigma_\mu = \sigma_I/\sqrt{N} . \qquad (2)$$

Note that this result holds regardless of the shape of the original distribution $I(\lambda)$.

Equation (2) gives the required uncertainty in the measured line shift. It is the width of
the line to be measured divided by the square root of the number of photons in the sample.
Moreover, the Central Limit Theorem gives us the bonus that the distribution of $\mu_k$ is
Gaussian.
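As a rough numerical check of this result, the following sketch draws photons from a deliberately non-Gaussian line shape and compares the scatter of the sample centroid with $\sigma_I/\sqrt{N}$. The uniform line shape on [-1, 1], the function name `centroid_scatter`, the sample sizes and the seed are all illustrative assumptions, not part of the analysis above.

```python
import math
import random
import statistics


def centroid_scatter(n_photons, n_samples, seed=0):
    """Return (measured, predicted) standard deviation of the sample centroid.

    Photons are drawn from a uniform line shape on [-1, 1], chosen only
    because it is clearly non-Gaussian; its standard deviation is 1/sqrt(3).
    """
    rng = random.Random(seed)
    sigma_i = 1.0 / math.sqrt(3.0)
    centroids = []
    for _ in range(n_samples):
        total = sum(rng.uniform(-1.0, 1.0) for _ in range(n_photons))
        centroids.append(total / n_photons)
    measured = statistics.pstdev(centroids)
    predicted = sigma_i / math.sqrt(n_photons)
    return measured, predicted


measured, predicted = centroid_scatter(n_photons=400, n_samples=2000)
print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

With a few thousand samples the two numbers should agree to within a few percent, the residual difference being the statistical error of the check itself.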

2.2 Perfect Spectrometer, Background Subtraction 

Now we consider a situation in which we know that there are, in addition to the photons 
we care about, background photons, which we will have to subtract from our spectrum. 
Suppose the background spectrum to be given by a probability distribution $B(\lambda)$ (so that
$\int B(\lambda)\,d\lambda = 1$), having a standard deviation (spectral width) $\sigma_B$ and a centroid $\mu_B$.

A measurement consists of obtaining a large number of photons, each of whose wavelength
is measured. Of those photons, $N_I$ are the signal photons and $N_B$ are the background
photons, $N = N_I + N_B$. We don't know which photon is which, but for purposes of reference
we will suppose them to be ordered such that the first $N_I$ are the signal photons. We will
assume initially that the numbers $N_I$ and $N_B$ are known and fixed.

Our best estimate of the background is that its spectral photon density is $N_B B(\lambda)$, so
that the background contribution to the centroid sum $\sum \lambda_i$ is on average $N_B \mu_B$. We will
subtract this background from the spectrum and then obtain the centroid of the remaining
"signal". Therefore the estimate we should use for the signal photons' centroid is

$$\mu_{Ik} = \frac{1}{N_I}\left(\sum_{i=1}^{N}\lambda_i - N_B\mu_B\right) = \frac{1}{N_I}\left(\sum_{i=1}^{N_I}\lambda_i + \sum_{i=N_I+1}^{N}\lambda_i - N_B\mu_B\right) . \qquad (3)$$

The second equality in eq (3) is written to show that if we had a perfect knowledge of the
background contribution, the last two terms would exactly cancel, returning us to our original form
(1).

However, the two terms do not exactly cancel, even if our knowledge of $B(\lambda)$ is perfect,
because of the statistics of the background photons. We know that the centroid of the
background photons in the sample,

$$\mu_{Bk} = \frac{1}{N_B}\sum_{i=N_I+1}^{N}\lambda_i , \qquad (4)$$

is distributed with Gaussian probability distribution with mean $\mu_B$ and standard deviation
$\sigma_B/\sqrt{N_B}$. Therefore our line position estimate is the sum of three terms. The first is the
centroid of the signal photons in the sample, distributed with standard deviation $\sigma_I/\sqrt{N_I}$.
The second is the centroid of the background photons in the sample, $\mu_{Bk}$, times $N_B/N_I$, and
the last is a constant, which we chose to cancel the mean of the second term.

Utilizing the theorem that a random variable that is the sum of two other independent 
random variables has a distribution whose mean is the sum of the means ([1] Property 7.4), 
and whose variance (standard deviation squared) is the sum of the variances (Property 7.9), 
we can immediately deduce that the distribution of our line shift estimate has mean equal 
to the centroid of $I(\lambda)$ and standard deviation:

$$\sigma_\mu = \sqrt{\frac{\sigma_I^2}{N_I} + \frac{\sigma_B^2 N_B}{N_I^2}} . \qquad (5)$$



This important result shows how rapidly a background can come to dominate the
uncertainty in the measurement of line shift. If $\sigma_I \sim \sigma_B$, then as soon as $N_B > N_I$ it is the
second term, coming from statistics of the background photons, that determines the shift
uncertainty.
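To make the crossover concrete, here is a minimal sketch that evaluates the variance sum $\sigma_I^2/N_I + \sigma_B^2 N_B/N_I^2$ derived in this subsection for a few background levels. The function name `shift_uncertainty` and the numerical values are hypothetical, chosen only for illustration.

```python
import math


def shift_uncertainty(sigma_i, n_i, sigma_b=0.0, n_b=0):
    """Standard deviation of the background-subtracted centroid.

    Implements sqrt(sigma_i^2 / n_i + sigma_b^2 * n_b / n_i^2).
    """
    signal_var = sigma_i ** 2 / n_i
    background_var = sigma_b ** 2 * n_b / n_i ** 2
    return math.sqrt(signal_var + background_var)


# Equal widths: the background term overtakes the signal term once n_b > n_i.
for n_b in (0, 100, 1000, 10000):
    print(n_b, shift_uncertainty(sigma_i=1.0, n_i=1000, sigma_b=1.0, n_b=n_b))
```

With no background the function reduces to $\sigma_I/\sqrt{N_I}$, and with equal widths and equal photon numbers the uncertainty is larger by a factor $\sqrt{2}$.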

Notice also that if the background spectrum is broad, for example flat, in the vicinity of
the signal line, then the effective $\sigma_B$ is determined purely by the spectral band chosen over
which to sum the photons. To minimize this width we should choose a spectral band that is
just wide enough to encompass the signal line, but no wider. In that case, the background
width will be wider than the actual line spread by roughly a factor of 2 or 3, depending on
the exact band width chosen.

In the contrasting case where the background line mimics the signal line in width so that
$\sigma_I = \sigma_B$, the resultant uncertainty is

$$\sigma_\mu = \frac{\sigma_I}{\sqrt{N_I}}\sqrt{1 + N_B/N_I} . \qquad (6)$$



Either way, when $N_B \gg N_I$, the uncertainty becomes

$$\sigma_\mu \approx \frac{\sigma_B\sqrt{N_B}}{N_I} . \qquad (7)$$

Since both of the random contributions to $\mu_{Ik}$ are normally distributed (by the Central
Limit Theorem), and the distribution of a sum of normally distributed random variables is
normally distributed ([1] Theorem 10.6), $\mu_{Ik}$ itself is also normally distributed.
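The background-subtracted estimate can also be checked by simulation. The sketch below assumes Gaussian signal and background shapes (an assumption made purely for convenience), fixed photon numbers, and a known background centroid; it forms the estimate (total wavelength sum minus $N_B\mu_B$, divided by $N_I$) and compares its scatter with $\sqrt{\sigma_I^2/N_I + \sigma_B^2 N_B/N_I^2}$. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def subtracted_centroids(n_i, n_b, sigma_i, sigma_b, mu_b, n_samples, seed=1):
    """Background-subtracted centroid estimates for fixed n_i and n_b.

    Signal photons are Gaussian about 0; background photons are Gaussian
    about mu_b. Both shapes are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        total = sum(rng.gauss(0.0, sigma_i) for _ in range(n_i))
        total += sum(rng.gauss(mu_b, sigma_b) for _ in range(n_b))
        # Subtract the expected background contribution n_b * mu_b.
        estimates.append((total - n_b * mu_b) / n_i)
    return estimates


n_i, n_b, sigma_i, sigma_b = 200, 600, 1.0, 1.5
est = subtracted_centroids(n_i, n_b, sigma_i, sigma_b, mu_b=0.3, n_samples=2000)
measured = statistics.pstdev(est)
predicted = math.sqrt(sigma_i ** 2 / n_i + sigma_b ** 2 * n_b / n_i ** 2)
print(f"mean {statistics.fmean(est):+.4f}")
print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

The mean of the estimates should be consistent with the true signal centroid (zero here), and the measured scatter should match the prediction to within the statistical error of the simulation.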

2.3 Offset Background: Background-fraction Fluctuations. 

If the background distribution is substantially shifted in its mean from the mean of the signal, 
an additional effect may be non-negligible. We must account for the statistical uncertainty 
in the fraction of sample photons that is actually background, rather than signal. In other 
words, although we will assume we know the average numbers of signal and background
photons, $N_I$ and $N_B$, we do not know the actual numbers for any specific sample, $N_{Ik}$ and
$N_{Bk}$. Therefore, our expression for the estimate of the signal mean, eq (3), must more
precisely be written

$$\mu_{Ik} = \frac{1}{N_I}\left(\sum_{i=1}^{N}\lambda_i - N_B\mu_B\right) = \frac{1}{N_I}\left(\sum_{i=1}^{N_{Ik}}\lambda_{Ii} + \sum_{i=N_{Ik}+1}^{N}\lambda_{Bi} - N_B\mu_B\right) , \qquad (8)$$

where we explicitly denote with a subscript those photons that are signal and background. 
Grouping terms appropriately we then write 

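$$\mu_{Ik} = \frac{1}{N_I}\left(\sum_{i=1}^{N_I}\lambda_{Ii} + \sum_{i=N_I+1}^{N_{Ik}}\lambda_{Ii} - \sum_{i=N_I+1}^{N_{Ik}}\lambda_{Bi} + \sum_{i=N_I+1}^{N}\lambda_{Bi} - N_B\mu_B\right) , \qquad (9)$$

which is precisely the previous expression, eq (3), plus the combination

$$\frac{1}{N_I}\left(\sum_{i=N_I+1}^{N_{Ik}}\lambda_{Ii} - \sum_{i=N_I+1}^{N_{Ik}}\lambda_{Bi}\right) . \qquad (10)$$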



This expression is zero if $N_{Ik} = N_I$, but for a fixed difference $N_{Ik} - N_I$ it has expectation
$(N_{Ik} - N_I)(\mu_I - \mu_B)/N_I$. To lowest order in $1/N$, this additional term is therefore distributed
as $\mu_I - \mu_B$ times the fractional deviation of $N_{Ik}$ from $N_I$. The standard deviation of $N_{Ik}/N$
is $\sqrt{N_I N_B/N^3}$. So the additional term has standard deviation $|\mu_I - \mu_B|\sqrt{N_B/(N_I N)}$.

Consequently the standard deviation of the estimate of the line position (eq (5)) is
generalized to

$$\sigma_\mu = \sqrt{\frac{\sigma_I^2}{N_I} + \frac{(\mu_I - \mu_B)^2 N_B}{N_I N} + \frac{\sigma_B^2 N_B}{N_I^2}} . \qquad (11)$$

Recalling the considerations in the previous subsection, the wavelength collection region
must be chosen approximately centered on $\mu_I$ and with (half) width no more than 2-3 times
$\sigma_I$. As a result the mean collected background photon wavelength cannot deviate by more than
2-3 $\sigma_I$ from $\mu_I$: $|\mu_I - \mu_B| \lesssim 2\sigma_I$. The additional (center) term in eq (11) is most significant
when $N_B \sim N_I$, because if $N_B \ll N_I$ it is negligible compared with the first term and if
$N_I \ll N_B$ it is negligible compared with the last. Thus the worst case is that the extra term
increases the uncertainty arising from the signal width term by about a factor of 2. More
typically, $|\mu_I - \mu_B| < \sigma_I$, in which case the additional term is essentially negligible.
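The relative size of the three contributions is easy to tabulate. The sketch below returns the three variance terms separately: the signal term $\sigma_I^2/N_I$, the background-fraction term $(\mu_I - \mu_B)^2 N_B/(N_I N)$, and the background term $\sigma_B^2 N_B/N_I^2$. The function name and the example numbers (an offset of $2\sigma_I$ with equal photon numbers) are hypothetical.

```python
import math


def shift_variance_terms(sigma_i, n_i, sigma_b, n_b, mu_i, mu_b):
    """Return the (signal, fraction, background) variance contributions."""
    n = n_i + n_b
    signal = sigma_i ** 2 / n_i
    fraction = (mu_i - mu_b) ** 2 * n_b / (n_i * n)
    background = sigma_b ** 2 * n_b / n_i ** 2
    return signal, fraction, background


# Background centroid offset by 2 sigma_i, equal numbers of signal and
# background photons, background twice as wide as the line.
terms = shift_variance_terms(1.0, 1000, 2.0, 1000, mu_i=0.0, mu_b=2.0)
print("variance terms:", terms)
print("sigma_mu:", math.sqrt(sum(terms)))
```

In this example the background-fraction term is comparable to the signal term but smaller than the background term; reducing the offset to below $\sigma_I$ makes it a minor correction.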



2.4 Imperfect Spectrometer or Complex Line Shape 

In practice a spectrometer has a finite instrumental line width. Its effects can often be 
described by an instrument line shape $S(\lambda)$, such that the observed spectrum is the
convolution of $S(\lambda)$ with the incident spectrum. Consequently a perfectly monochromatic line,
$I(\lambda) = \delta(\lambda - \lambda_0)$, acquires the instrumental shape $S(\lambda - \lambda_0)$. Because the instrument shape
can be taken to have zero centroid shift, the convolution introduces no systematic shift of 
the line, nor, more importantly, any random shift. What it does, however, is to broaden the 
width of the observed line, relative to what would otherwise have been observed. 

From the point of view of statistics, all of the previous arguments still apply. But they 
apply to the instrumentally broadened observed line rather than to the original line shapes. 
In particular all of the foregoing formulas will still apply if the $\sigma$ factors are taken as the
widths of the lines after convolving with the instrumental function. In other words, if the 
observed widths are used. 

By the same token, if the lines being examined have complex line shape due, for example, 
to a multiplet structure, then all of the foregoing formulas still apply. But the width of the 
lines is then perhaps dominated by the multiplet structure rather than the other broadening 
mechanisms. 



2.5 Finite-width Wavelength Bins 

A situation of importance for modern spectrometers is when the instrument does not simply
produce a convolution of the line with an instrumental line shape. Instead, one might have a
series of bins of finite width into which the photons are sorted. This will occur, for example,
with a multi-element detector when each of the different elements corresponds to a different 
wavelength bin. It is clear that if the bin width is much less than the observed width of the 
line, then the binning will amount simply to a discrete approximation to the integrals and 
sums invoked above. 
However, if the bin size is larger than the line width, then strongly non-linear effects on
the shift will be introduced. For example, as a narrow line moves across a wider bin, no 
detectable change in the spectrum occurs until the line approaches the boundary between 
bins. Then as it crosses the bin boundary, a step in apparent position of the photons occurs 
from one bin to the next. 

If the bins are of equal width, perhaps the best way to see the effect of the bins is as 
follows. We deduce the centroid by assigning each photon in a bin a wavelength equal to the 
center of the bin; this is the most reasonable unbiased estimate. Let the center of the $j$th
bin be at wavelength $\lambda_j = j\Delta\lambda$ (measured from a convenient zero). Then the error in the
centroid for a photon at position $\lambda$ lying in the $j$th bin is $\lambda - \lambda_j$, which is a linear ramp. If
the photon lies in the next bin $j + 1$, the error has an identical ramp $(\lambda - \lambda_{j+1})$. Therefore
the error function has the form of a sawtooth:

$$\mathcal{E}(\lambda) = \lambda - \lambda_j , \quad \text{for } (\lambda_j + \lambda_{j-1})/2 < \lambda < (\lambda_{j+1} + \lambda_j)/2 . \qquad (12)$$

And a line shape $I(\lambda)$ acquires an error

$$\int I(\lambda)\,\mathcal{E}(\lambda)\,d\lambda . \qquad (13)$$

Now $\mathcal{E}$ is a periodic function, and can thus be expressed as a Fourier series

$$\mathcal{E} = \sum_{m=1}^{\infty} A_m \sin(2\pi m\lambda/\Delta\lambda) . \qquad (14)$$

When this is substituted into eq (13), we see immediately that the error arising from a line
shape $I(\lambda)$ can be written as a weighted sum of the Fourier transform of $I(\lambda)$ evaluated at
the (spatial) frequencies $2\pi m/\Delta\lambda$. The most important error will arise from the fundamental,
$m = 1$, both because $A_m$ is a decreasing function of $m$ and because the Fourier transform
will be a decreasing function of frequency. Indeed if $I(\lambda)$ is a Gaussian, then its Fourier
transform is also a Gaussian, whose width is $\sim 1/\sigma_I$. Provided that $2\pi/\Delta\lambda$ is substantially
larger than this width, the binning error will become negligibly small. This condition is
equivalent to the requirement that there be more than a few bins covering the width of the
line.
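The rapid disappearance of the binning error as the line broadens can be seen numerically. The following sketch assumes a Gaussian line and unit-width bins centered at integer wavelengths; it computes the exact bin populations from the error function and returns the difference between the bin-center centroid and the true centroid. The function name and the chosen line position (a quarter bin from a bin center) are illustrative assumptions.

```python
import math


def binned_centroid_error(center, sigma, bin_width=1.0, n_bins=60):
    """Binned centroid minus true centroid for a Gaussian line.

    Bins are centered at j * bin_width; each photon is assigned its bin
    center, as in the text. The Gaussian shape is an assumption.
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - center) / (sigma * math.sqrt(2.0))))

    binned = 0.0
    for j in range(-n_bins, n_bins + 1):
        lo = (j - 0.5) * bin_width
        hi = (j + 0.5) * bin_width
        # Fraction of the line falling in bin j, weighted by the bin center.
        binned += j * bin_width * (cdf(hi) - cdf(lo))
    return binned - center


# Line displaced a quarter bin from a bin center.
for sigma in (0.1, 0.3, 0.5, 1.0):
    err = binned_centroid_error(center=0.25, sigma=sigma)
    print(f"sigma/bin = {sigma:.1f}: centroid error = {err:+.2e}")
```

For a line much narrower than the bin the error is nearly the full displacement from the bin center, while for a width comparable to the bin it falls by many orders of magnitude.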

Thus, somewhat counter-intuitively, if we have a binned spectrum, it is advantageous, 
from the viewpoint of line position measurement, that the effective line-shape (the convolu- 
tion of the received line with the instrument function) should be significantly broader than 
the bin width. It is actually a substantial disadvantage to have a line narrower than approx- 
imately the bin width. For that reason, a detector array ought always to be spaced closer 
than the instrumental resolution. 

Moreover, the naive idea that the bin size represents a minimum resolution, and thus a 
minimum line-shift resolution, is false. For a Gaussian line, the Fourier transform decays so 
rapidly at a few times its width that it becomes completely negligible. When that occurs, 
if there are enough photons in the line, the statistical uncertainty in the line position
may be far smaller than the bin width.
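A small simulation illustrates the point. In this sketch, photons from a Gaussian line whose standard deviation equals the bin width are sorted into bins, each photon is assigned its bin center, and the centroid is formed. The Gaussian shape, the names, and all numerical values are assumptions made for illustration.

```python
import math
import random
import statistics


def binned_centroid(rng, n_photons, center, sigma, bin_width):
    """Centroid of one sample after sorting photons into bins of bin_width."""
    total = 0.0
    for _ in range(n_photons):
        wavelength = rng.gauss(center, sigma)
        # Assign the photon the center of the bin it falls in.
        total += round(wavelength / bin_width) * bin_width
    return total / n_photons


rng = random.Random(2)
bin_width, sigma, center, n_photons = 1.0, 1.0, 0.3, 2000
centroids = [binned_centroid(rng, n_photons, center, sigma, bin_width)
             for _ in range(300)]
scatter = statistics.pstdev(centroids)
bias = statistics.fmean(centroids) - center
print(f"scatter {scatter:.4f}, bias {bias:+.4f}, bin width {bin_width}")
```

Both the scatter and the bias of the binned centroid should come out as a small fraction of the bin width, even though no individual photon is located to better than one bin.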
3 Line Width



3.1 Line Width, Zero Background 

Generally speaking, the line width can be analysed in a way comparable to the shift. How- 
ever, the elementary statistical theorems are not quite as general. 

The line width can be defined in terms of the second moment of the sample. The usual
unbiased statistical estimate for the width ($\sigma_I$) of the population, based on a sample of size
$N$, is $S_k$, the square root of the sample variance given by:

$$S_k^2 = \frac{1}{N - 1}\sum_{i=1}^{N}(\lambda_i - \mu_k)^2 . \qquad (15)$$



If $I(\lambda)$ is a Gaussian distribution, then by Theorem 10.8 [1], the variable $u = S^2(N - 1)/\sigma_I^2$,
which is a sum of squares of independent normally distributed variables, has a chi-squared
distribution of $N$ degrees of freedom (strictly $N - 1$, because the sample mean is used; the
distinction is immaterial for large $N$). In other words, its probability distribution function is

$$\frac{1}{2^{N/2}\,\Gamma(N/2)}\,u^{N/2 - 1} e^{-u/2} . \qquad (16)$$

In particular, since that chi-squared distribution has mean $N$ and variance $2N$, the variance
of $S^2$ is $2N \times [\sigma_I^2/N]^2 = 2\sigma_I^4/N$, and the standard deviation of $S^2$ is $\sigma_I^2\sqrt{2/N}$.

For large $N$ (by [1] Theorem 9.2), if $u$ is distributed as a chi-squared distribution of $N$
degrees of freedom, then the variable $\sqrt{2u}$ has approximately a normal distribution of mean
$\sqrt{2N - 1}$ and variance unity. Thus for large $N$ the distribution of $S$ is Gaussian with mean

$$\mu_S = \sigma_I\sqrt{\frac{2N - 1}{2(N - 1)}} \qquad (17)$$

and standard deviation

$$\sigma_S = \frac{\sigma_I}{\sqrt{2(N - 1)}} . \qquad (18)$$

The value $\sigma_S$, eq (18), can be considered the uncertainty in determination of the line
width, when the width is evaluated by using the second moment of the line shape. As
before, it should be applied to the observed line width (including instrumental broadening).
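The width uncertainty can be checked in the same way as the shift uncertainty. This sketch assumes a Gaussian line, forms the sample standard deviation with the $N - 1$ denominator for many independent samples, and compares the scatter of those widths with $\sigma_I/\sqrt{2(N - 1)}$. The function name, sample sizes and seed are hypothetical.

```python
import math
import random
import statistics


def width_scatter(n_photons, n_samples, sigma_i=1.0, seed=3):
    """Return (measured, predicted) standard deviation of the sample width S."""
    rng = random.Random(seed)
    widths = []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, sigma_i) for _ in range(n_photons)]
        mean = sum(sample) / n_photons
        # Sample variance with the N - 1 denominator.
        variance = sum((x - mean) ** 2 for x in sample) / (n_photons - 1)
        widths.append(math.sqrt(variance))
    measured = statistics.pstdev(widths)
    predicted = sigma_i / math.sqrt(2.0 * (n_photons - 1))
    return measured, predicted


measured, predicted = width_scatter(n_photons=400, n_samples=2000)
print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

Note that, unlike the centroid result, this prediction relies on the Gaussian assumption; a line shape with a different fourth moment would be expected to give a different width scatter.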



3.2 Line Width, Background Subtraction 



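The natural unbiased estimate of the line width in the presence of background photons is
$S_k$, where

$$S_k^2 = \frac{1}{N_I - 1}\left[\sum_{i=1}^{N}(\lambda_i - \mu_{Ik})^2 - N_B\{\sigma_B^2 + (\mu_{Ik} - \mu_B)^2\}\right] , \qquad (19)$$

where $\mu_{Ik}$ is given by eq (3). This can be written

$$\begin{aligned}
S_k^2 &= \frac{1}{N_I - 1}\Big[\sum_{i=1}^{N_I}(\lambda_i - \mu_{Ik})^2 + \sum_{i=N_I+1}^{N}(\lambda_i - \mu_{Ik})^2 - N_B\{\sigma_B^2 + (\mu_{Ik} - \mu_B)^2\}\Big] \\
&= \frac{1}{N_I - 1}\Big[\sum_{i=1}^{N_I}(\lambda_i - \mu_{Ik})^2 + \sum_{i=N_I+1}^{N}\{(\lambda_i - \mu_B)^2 - \sigma_B^2\} - \sum_{i=N_I+1}^{N}2\lambda_i(\mu_{Ik} - \mu_B) + N_B(\mu_{Ik}^2 - \mu_B^2) - N_B(\mu_{Ik} - \mu_B)^2\Big] \\
&= \frac{1}{N_I - 1}\Big[\sum_{i=1}^{N_I}(\lambda_i - \mu_{Ik})^2 + \sum_{i=N_I+1}^{N}\{(\lambda_i - \mu_B)^2 - \sigma_B^2\} + 2(\mu_B - \mu_{Ik})\sum_{i=N_I+1}^{N}(\lambda_i - \mu_B)\Big] . \qquad (20)
\end{aligned}$$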



The second and third sums in this expression come from the statistics of the background
photons. The variances of these two expressions are given by:

$$\mathrm{Var}\left(\frac{1}{N_B - 1}\sum_{i=N_I+1}^{N}\{(\lambda_i - \mu_B)^2 - \sigma_B^2\}\right) = \frac{2\sigma_B^4}{N_B} \qquad (21)$$

and

$$\mathrm{Var}\left(\frac{1}{N_B - 1}\sum_{i=N_I+1}^{N}(\lambda_i - \mu_B)\right) = \frac{\sigma_B^2}{N_B - 1} . \qquad (22)$$

Assuming that the variance of $S_k^2$ can be written as the sum of the variances of these three
terms (which is not obvious, since they are not independent), we get

$$\mathrm{Var}(S_k^2) = \frac{2\sigma_I^4}{N_I - 1} + \left(\frac{N_B - 1}{N_I - 1}\right)^2\left[\frac{2\sigma_B^4}{N_B} + \frac{4(\mu_B - \mu_{Ik})^2\sigma_B^2}{N_B - 1}\right] . \qquad (23)$$



Let us drop the irrelevant small distinction between $N - 1$ and $N$. Then this expression
shows that the background terms will dominate if $(N_B/N_I)\sigma_B^4 > \sigma_I^4$. The third term will give
an additional contribution unless we can arrange that the background spectrum is centered
on the signal line to better than $\sigma_B$; but that should be easy to accomplish, so we will ignore
the third term. Then

$$\mathrm{Var}(S_k^2) = \frac{2\sigma_I^4}{N_I} + \frac{2\sigma_B^4 N_B}{N_I^2} . \qquad (24)$$



And of course the expectation (mean) of $S_k^2$ is $\sigma_I^2$.

Provided the distribution of $S_k^2$ is narrow, we can regard it as centered at
$\mu_{S^2}$ with a much smaller width $\sigma_{S^2}$. Then the square-root variable $S$ can be expanded
schematically as

$$S = \sqrt{S^2} = \sqrt{\mu_{S^2} \pm \sigma_{S^2}} \approx \sqrt{\mu_{S^2}}\left(1 \pm \tfrac{1}{2}\,\sigma_{S^2}/\mu_{S^2}\right) , \qquad (25)$$

which shows that the mean of $S$ is $\sqrt{\mu_{S^2}}$ but the standard deviation of $S$ is $\approx \tfrac{1}{2}\sigma_{S^2}/\sqrt{\mu_{S^2}}$.
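Applying this to the result of eq (24), we find that the standard deviation of our estimate $S$
of the line width is

$$\sigma_S = \frac{1}{2\sigma_I}\sqrt{\frac{2\sigma_I^4}{N_I} + \frac{2\sigma_B^4 N_B}{N_I^2}} = \frac{\sigma_I}{\sqrt{2N_I}}\sqrt{1 + \frac{\sigma_B^4 N_B}{\sigma_I^4 N_I}} . \qquad (26)$$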



This result is consistent with eq (18), which was obtained using more specific distribution
assumptions, but without the approximations made here.
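The consistency can be verified directly. The sketch below evaluates the width uncertainty as half the standard deviation of $S^2$ divided by $\sigma_I$, with $\mathrm{Var}(S^2) = 2\sigma_I^4/N_I + 2\sigma_B^4 N_B/N_I^2$, and compares the zero-background case with $\sigma_I/\sqrt{2N_I}$. The function name and the numbers are hypothetical.

```python
import math


def width_uncertainty(sigma_i, n_i, sigma_b=0.0, n_b=0):
    """Standard deviation of the width estimate S in the large-N limit."""
    var_s2 = 2.0 * sigma_i ** 4 / n_i + 2.0 * sigma_b ** 4 * n_b / n_i ** 2
    # Propagate through the square root: sigma_S = sigma_{S^2} / (2 sigma_i).
    return math.sqrt(var_s2) / (2.0 * sigma_i)


no_background = width_uncertainty(sigma_i=1.0, n_i=1000)
with_background = width_uncertainty(sigma_i=1.0, n_i=1000, sigma_b=2.0, n_b=1000)
print(no_background, 1.0 / math.sqrt(2.0 * 1000))  # the two should agree
print(with_background)
```

Because the background enters as $\sigma_B^4$, a background only twice as wide as the line already dominates the width uncertainty at equal photon numbers.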
4 Summary



When line shift and width are measured using a large sample of $N_I$ signal photons from a
line whose width is $\sigma_I$ in the presence of $N_B$ background photons from a population centered
on the signal line but with width $\sigma_B$, the photon statistical uncertainties in the shift and
width of the signal line, deduced from the moments of the distribution, are respectively:

$$\sigma_\mu = \frac{\sigma_I}{\sqrt{N_I}}\sqrt{1 + \frac{\sigma_B^2 N_B}{\sigma_I^2 N_I}} \qquad (27)$$

and

$$\sigma_S = \frac{\sigma_I}{\sqrt{2N_I}}\sqrt{1 + \frac{\sigma_B^4 N_B}{\sigma_I^4 N_I}} . \qquad (28)$$



In these expressions, the observed widths of the line and background, including instrumental 
or multiplet broadening effects, should be used. 